Parsing preferences with Lexicalized Tree Adjoining Grammars: exploiting the derivation tree
Abstract
Since Kimball (73), parsing preference principles such as "Right Association" (RA) and "Minimal Attachment" (MA) have often been formulated with respect to constituent trees. We present three preference principles based on "derivation trees" within the framework of LTAGs. We argue that they remedy some shortcomings of the former approaches and account for widely accepted heuristics (e.g. argument/modifier, idioms...).

Introduction

The inherent characteristics of LTAGs (i.e. lexicalization, adjunction, an extended domain of locality and "mildly context-sensitive" power) make them attractive for Natural Language Processing: LTAGs are parsable in polynomial time and allow an elegant and psycholinguistically plausible representation of natural language (e.g. Frank (92) discusses the psycholinguistic relevance of adjunction for child language acquisition; Joshi (90) discusses psycholinguistic results on crossed and serial dependencies). Large-coverage grammars have been developed for English (Xtag group (95)) and French (Abeillé (91)). Unfortunately, "large" grammars yield high ambiguity rates: Doran et al. (94) report 7.46 parses per sentence on a WSJ corpus of 18730 sentences using a wide-coverage English grammar. Srinivas et al. (95) formulate domain-independent heuristics to rank parses, but this approach is practical, English-oriented, not explicitly linked to psycholinguistic results, and does not fully exploit "derivation" information. In this paper, we present three disambiguation principles which exploit derivation trees.

1. Brief presentation of LTAGs

An LTAG consists of a finite set of elementary trees of finite depth. Each elementary tree must "anchor" one or more lexical items. The principal anchor is called the "head"; other anchors are called "co-heads". All leaves in elementary trees are either "anchor", "foot node" (noted *) or "substitution node" (noted ↓). These trees are of two types: initial or auxiliary. A tree has at most one foot node; such a tree is an auxiliary tree.
Trees that are not auxiliary are initial (traditionally, initial trees are called α and auxiliary trees β). Elementary trees combine through two operations: substitution and adjunction. Substitution is compulsory and is used essentially for arguments (subject, verb and noun complements). It consists in replacing, in a tree (elementary or not), a node marked for substitution with an initial tree whose root has the same category. Adjunction is optional (although it can be forbidden or made compulsory using specific constraints) and deals essentially with determiners, modifiers, auxiliaries, modals and raising verbs (e.g. seem). It consists in inserting, in place of a node X, an auxiliary tree whose root has the same category; the descendants of X then become the descendants of the foot node of the auxiliary tree. Contrary to context-free rewriting rules, the history of derivation must be made explicit, since the same derived tree can be obtained through different derivations. This is why parsing LTAGs yields a derivation tree, from which a derived tree (i.e. constituent tree) can be obtained (Figure 1). Branches in a derivation tree are unordered. Moreover, linguistic constraints on the well-formedness of elementary trees have been formulated:

• Predicate Argument Cooccurrence Principle: there must be a leaf node for each realized argument of the head of an elementary tree.
• Semantic consistency: no elementary tree is semantically void.
• Semantic minimality: an elementary tree corresponds to at most one semantic unit.

2. Former results on parsing preferences

A vast literature addresses parsing preferences. Structural approaches introduced two principles. RA accounts for the preferred reading of the ambiguous sentence (a): "yesterday" attaches to "left" and not to "said" (Kimball (73)).
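As an illustration of the derivation bookkeeping described in Section 1, here is a minimal Python sketch; the class and tree names (`ElementaryTree`, `alpha-kick-the-bucket`, etc.) are our own illustrative inventions, not part of any actual LTAG implementation. Each elementary tree records its head anchor and type, and every substitution or adjunction is recorded as an edge of the derivation tree:

```python
# Minimal sketch of LTAG derivation bookkeeping (illustrative names only).

class ElementaryTree:
    """An elementary tree: 'alpha' (initial) or 'beta' (auxiliary)."""
    def __init__(self, name, head, kind):
        assert kind in ("alpha", "beta")
        self.name, self.head, self.kind = name, head, kind
        self.children = []   # derivation-tree daughters (unordered)

    def substitute(self, arg):
        # Substitution: an initial tree fills an argument slot.
        assert arg.kind == "alpha"
        self.children.append(("subst", arg))

    def adjoin(self, mod):
        # Adjunction: an auxiliary tree attaches to some node.
        assert mod.kind == "beta"
        self.children.append(("adj", mod))

    def size(self):
        # Number of nodes in the derivation tree rooted here.
        return 1 + sum(t.size() for _, t in self.children)

# Derivation for "Yesterday John kicked the bucket" (idiomatic reading):
# the whole idiom anchors a single elementary tree.
kicked = ElementaryTree("alpha-kick-the-bucket", "kicked", "alpha")
kicked.substitute(ElementaryTree("alpha-John", "John", "alpha"))
kicked.adjoin(ElementaryTree("beta-yesterday", "yesterday", "beta"))
print(kicked.size())  # prints 3
```

Under the literal reading, "the bucket" would be a separate initial tree substituted into a plain "kick" tree, so its derivation tree has more nodes; that node count is exactly what the paper's preference principles compare.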
MA accounts for the preferred reading of (b): "for Sue" attaches to "bought" and not to "flowers" (Frazier & Fodor (78)).

(a) Tom said that Joe left yesterday
(b) Tom bought the flowers for Sue

These structural principles have been criticized, though. Among other things, the interaction between the two principles is unclear. This type of approach lacks provision for integration with semantics and/or pragmatics (Schubert (84)), does not clearly establish the distinction between arguments and modifiers (Ferreira & Clifton (86)), and is English-biased: evidence against RA has been found for Spanish (Cuetos & Mitchell (88)) and Dutch (Brysbaert & Mitchell (96)). Some parsing preferences are widely accepted, though: the idiomatic interpretation of a sentence is favored over its literal interpretation (Gibbs & Nayak (89)), and arguments are preferred over modifiers (Abney (89), Britt et al. (92)). (Our examples follow the linguistic analyses presented in Abeillé (91), except that we substitute sentential complements when no extraction occurs; thus we use no VP node and no Wh or NP traces. This has no incidence on the application of our preference principles.) Additionally, lexical factors (e.g. frequency of subcategorization for a given verb) have been shown to influence parsing preferences (Hindle & Rooth (93)). It is striking that these three most consensual types of syntactic preferences turn out to be difficult to formalize by resorting only to "constituent trees", but easy to formalize in terms of LTAGs. Before explaining our approach, we must underline that the examples presented later on are not necessarily counter-examples to RA and/or MA, but just illustrations: our goal is not to further criticize RA and MA, but to show that problems linked to these "traditional" structural approaches do not automatically condemn all structural approaches.
3. Three preference principles based on derivation trees

For the sake of brevity, we will not develop the importance of "lexical factors"; we just note that LTAGs are obviously well suited to represent that type of preference because of strong lexicalization. (Moreover, "lexical preferences" and "structural preferences" are not necessarily antagonistic and can both be used for practical purposes.) To account for the "idiomatic" vs "literal" and for the "argument" vs "modifier" preferences, we formulate three parsing preference principles based on the shape of derivation trees:

1. Prefer the derivation tree with the smallest number of nodes.
2. Prefer to attach an α-tree low (by "low" we mean "as far as possible from the root").
3. Prefer the derivation tree with the smallest number of β-tree nodes.

Principle 1 takes precedence over principle 2, and principle 2 takes precedence over principle 3. (The examples presented below are kept simple on purpose, for the sake of clarity.)

3.1 What these principles account for

Principle 1 accounts for the preference of "idiomatic" over "literal": in LTAGs, all the set elements of an idiomatic expression are present in a single elementary tree. Figure 1 shows the two derivation trees obtained when parsing "Yesterday John kicked the bucket". The preferred one (i.e. the idiomatic interpretation) has fewer nodes.

FIGURE 1: Illustration of Principle 1. (In derivation trees, plain lines indicate an adjunction, dotted lines a substitution.)

FIGURE 2: Illustration of Principle 2.
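Taken together with their precedence order, the three principles amount to a lexicographic ranking of candidate derivation trees. The sketch below is our own interpretation, not the paper's implementation: in particular, we formalize "attach α-trees low" (Principle 2) as maximizing the summed depths of α-nodes in the derivation tree, and we represent each derivation as a flat list of hypothetical `(tree_name, kind, depth)` records rather than real parser output.

```python
# Rank candidate derivation trees by the three principles, in precedence order.
# A derivation is a list of (tree_name, kind, depth_in_derivation) records,
# where kind is "alpha" or "beta" -- an assumed, simplified representation.

def ranking_key(derivation):
    n_nodes = len(derivation)                                    # Principle 1
    # Principle 2: prefer alpha-trees attached low (far from the root).
    # Maximizing depths = minimizing their negated sum.
    alpha_low = -sum(d for _, kind, d in derivation if kind == "alpha")
    n_beta = sum(1 for _, kind, _ in derivation if kind == "beta")  # Principle 3
    return (n_nodes, alpha_low, n_beta)   # lexicographic: 1 before 2 before 3

def preferred(derivations):
    return min(derivations, key=ranking_key)

# "Yesterday John kicked the bucket": idiomatic vs literal derivations.
idiomatic = [("alpha-kick-the-bucket", "alpha", 0),
             ("alpha-John", "alpha", 1),
             ("beta-yesterday", "beta", 1)]
literal = [("alpha-kick", "alpha", 0),
           ("alpha-John", "alpha", 1),
           ("alpha-bucket", "alpha", 1),
           ("beta-the", "beta", 2),
           ("beta-yesterday", "beta", 1)]
print(preferred([idiomatic, literal])[0][0])  # prints alpha-kick-the-bucket
```

Because the key is a tuple, Python's built-in tuple comparison enforces the precedence order for free: Principle 2 is consulted only to break ties on node count, and Principle 3 only to break ties on both.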
Similar resources
Practical experiments in parsing using Tree Adjoining Grammars
We present an implementation of a chart-based head-corner parsing algorithm for lexicalized Tree Adjoining Grammars. We report on some practical experiments where we parse 2250 sentences from the Wall Street Journal using this parser. In these experiments the parser is run without any statistical pruning; it produces all valid parses for each sentence in the form of a shared derivation forest. ...
A faster parsing algorithm for Lexicalized Tree-Adjoining Grammars
This paper points out some computational inefficiencies of standard TAG parsing algorithms when applied to LTAGs. We propose a novel algorithm with an asymptotic improvement, from to , where is the input length and are grammar constants that are independent of vocabulary size. Introduction Lexicalized Tree-Adjoining Grammars (LTAGs) were first introduced in (Schabes et al., 1988) as a variant o...
An improved Earley parser with LTAG
This paper presents an adaptation of the Earley algorithm (Earley, 1968) for parsing with lexicalized tree-adjoining grammars (LTAGs). This algorithm constructs the derivation tree following a top-down strategy and verifies the valid prefix property. Many earlier algorithms do not have both of these properties (Schabes, 1994). The Earley-like algorithm described in (Schabes and Joshi, 1988) verifi...
Transforming Dependency Structures to LTAG Derivation Trees
We propose a new algorithm for parsing Lexicalized Tree Adjoining Grammars (LTAGs) which uses pre-assigned bilexical dependency relations as a filter. That is, given a sentence and its corresponding well-formed dependency structure, the parser assigns elementary trees to the words of the sentence and returns attachment sites compatible with these elementary trees and the predefined dependencies. Moreove...
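The filtering idea in this abstract can be sketched in a few lines; this is a toy illustration under our own assumptions (a `deps` map from each word to its governor), not the cited paper's algorithm. An attachment of one elementary tree into another is kept only when it agrees with the pre-assigned dependency:

```python
# Sketch: pre-assigned bilexical dependencies as a filter on attachments.
# deps maps each word to its governor; an attachment (child_head, parent_head)
# is kept only if the child's head word depends on the parent's head word.

def compatible(attachment, deps):
    child_head, parent_head = attachment
    return deps.get(child_head) == parent_head

deps = {"John": "kicked", "yesterday": "kicked", "bucket": "kicked"}
candidates = [("John", "kicked"), ("yesterday", "left")]
print([a for a in candidates if compatible(a, deps)])  # prints [('John', 'kicked')]
```

The filter prunes the parser's search space: only derivation-tree edges that mirror a predefined dependency edge survive.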
Some Experiments on Indicators of Parsing Complexity for Lexicalized Grammars
In this paper, we identify syntactic lexical ambiguity and sentence complexity as factors that contribute to parsing complexity in fully lexicalized grammar formalisms such as Lexicalized Tree Adjoining Grammars. We also report on experiments that explore the effects of these factors on parsing complexity. We discuss how these constraints can be exploited in improving efficiency of parsers for ...
Things between Lexicon and Grammar
A number of grammar formalisms were proposed in the 80's, such as Lexical Functional Grammars, Generalized Phrase Structure Grammars, and Tree Adjoining Grammars. These formalisms then began to put stress on the lexicon, and came to be called lexicalist (or lexicalized) grammars. Representative examples of lexicalist grammars were Head-driven Phrase Structure Grammars (HPSG) and Lexicalized Tree Adjoi...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
Publication date: 2007